channel-wise attention
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Controllable diffusion-based generation for multi-channel biological data
Zhang, Haoran, Zhou, Mingyuan, Tansey, Wesley
Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning on spatially aligned channels, and (2) a combination of latent-space and output-space channel-wise attention to capture inter-channel relationships. To support flexible conditioning and generalization to arbitrary subsets of observed channels, we train the model using a random masking strategy, enabling it to reconstruct missing channels from any combination of inputs. We demonstrate state-of-the-art performance across both spatial and non-spatial prediction tasks, including protein imputation in IMC and gene-to-protein prediction in single-cell datasets, and show strong generalization to unseen conditional configurations.
- North America > United States > Texas > Travis County > Austin (0.14)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.89)
- Health & Medicine > Therapeutic Area > Oncology (0.69)
Reviews: Controllable Text-to-Image Generation
The paper is well-organized and written, which can be followed easily. In particular, instead of generating a new image from the text, the authors pay more attention to image manipulation based on the modified natural language description. For the word-level spatial and channel-wise attention driven generator: (1) The novelty and effectiveness of attentional generator may be limited. Specifically, the paper designs a word-level spatial and channel-wise attention driven generator, which has two attention parts (i.e. However, since the spatial attention is based on the method in AttnGAN [7], most contributions may lie on the additional channel-wise part.
Improving Transformer-based Networks With Locality For Automatic Speaker Verification
Sang, Mufan, Zhao, Yong, Liu, Gang, Hansen, John H. L., Wu, Jian
Recently, Transformer-based architectures have been explored for speaker embedding extraction. Although the Transformer employs the self-attention mechanism to efficiently model the global interaction between token embeddings, it is inadequate for capturing short-range local context, which is essential for the accurate extraction of speaker information. In this study, we enhance the Transformer with the enhanced locality modeling in two directions. First, we propose the Locality-Enhanced Conformer (LE-Confomer) by introducing depth-wise convolution and channel-wise attention into the Conformer blocks. Second, we present the Speaker Swin Transformer (SST) by adapting the Swin Transformer, originally proposed for vision tasks, into speaker embedding network. We evaluate the proposed approaches on the VoxCeleb datasets and a large-scale Microsoft internal multilingual (MS-internal) dataset. The proposed models achieve 0.75% EER on VoxCeleb 1 test set, outperforming the previously proposed Transformer-based models and CNN-based models, such as ResNet34 and ECAPA-TDNN. When trained on the MS-internal dataset, the proposed models achieve promising results with 14.6% relative reduction in EER over the Res2Net50 model.
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Asia > India (0.04)
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Zhao, Sicheng, Jia, Zizhou, Chen, Hui, Li, Leida, Ding, Guiguang, Keutzer, Kurt
Existing methods on visual emotion analysis mainly focus on coarse-grained emotion classification, i.e. assigning an image with a dominant discrete emotion category. However, these methods cannot well reflect the complexity and subtlety of emotions. In this paper, we study the fine-grained regression problem of visual emotions based on convolutional neural networks (CNNs). Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet), a novel network architecture that integrates attention into a CNN with an emotion polarity constraint. First, we propose to incorporate both spatial and channel-wise attentions into a CNN for visual emotion regression, which jointly considers the local spatial connectivity patterns along each channel and the interdependency between different channels. Second, we design a novel regression loss, i.e. polarity-consistent regression (PCR) loss, based on the weakly supervised emotion polarity to guide the attention generation. By optimizing the PCR loss, PDANet can generate a polarity preserved attention map and thus improve the emotion regression performance. Extensive experiments are conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate that the proposed PDANet outperforms the state-of-the-art approaches by a large margin for fine-grained visual emotion regression. Our source code is released at: https://github.com/ZizhouJia/PDANet.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Jiangsu Province (0.04)